This lab will be submitted in pairs (if you don’t have a pair, please contact us) via the submission link in moodle.
Your final submission should include two files: an
Rmd file (with your answers filled-in) and an
html file that was generated automatically by knitting the
Rmd file using knitr. Name your files as
<ID1>_<ID2>.Rmd and
<ID1>_<ID2>.html (insert your ID numbers
instead).
Grading: There are \(8\) questions with overall \(15\) sub-questions. Each sub-question is
worth \(6\frac{2}{3}\) points to the
overall lab grade. The questions vary in length and difficulty level. It
is recommended to start with the simpler and shorter questions. Points
may be reduced for incorrect naming of files, missing parts and problems
in knitting the Rmd file and general appearance of the
report.
Libraries: The only allowed libraries are listed below (do not add additional libraries without permission from the course staff):
library(tidyverse) # This includes dplyr, stringr, ggplot2, ..
library(data.table)
library(rworldmap) # world map
library(ggthemes)
library(reshape2) # melt: change data-frame format long/wide
library(e1071) # skewness and kurtosis
library(rvest)
library(corrplot)
library(moments)
library(spatstat.geom)
The wikipedia/Democracy_Index
website hosts world-wide data on different measurements of democracy
index for world countries. For more information about it, please visit
here.
We will focus on analyzing the changes in the index in different countries, as well as the individual components comprising the index, and comparison to other datasets.
Your solution should be submitted as a full report integrating text, code, figures and tables. For each question, describe first in the text of your solution what you’re trying to do, then include the relevant code, then the results (e.g. figures/tables) and then a textual description of them.
In most questions the extraction/manipulation of relevant parts
of the data-frame can be performed using commands from the
tidyverse and dplyr R packages, such as
head, arrange, aggregate,
group_by, filter, select,
summarise, mutate etc.
When displaying tables, show the relevant columns and rows with meaningful names, and describe the results.
When displaying figures, make sure that the figure is clear to the reader, axis ranges are appropriate, labels for the axis, title and different curves/bars are displayed clearly (font sizes are large enough), a legend is shown when needed etc. Explain and describe in text what is shown in the figure.
It could be that in some cases data are missing
(e.g. NA). Make sure that all your calculations
(e.g. taking the maximum, average, correlation etc.) take this into
account. Specifically, the calculations should ignore the missing values
to allow us to compute the desired results for the rest of the values
(for example, using the option na.rm = TRUE or
use = "complete.obs").
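As a minimal sketch of the two options named above, the toy vectors below (hypothetical values, not taken from the lab data) show how a single missing value affects a summary and how each option handles it.

```r
# Toy vectors with missing values (hypothetical numbers, for illustration only)
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

# Without na.rm the missing value propagates and the result is NA
mean_with_na <- mean(x)

# na.rm = TRUE drops the missing values before computing the summary
mean_x <- mean(x, na.rm = TRUE)
max_y  <- max(y, na.rm = TRUE)

# cor() has no na.rm argument; use = "complete.obs" keeps only the pairs
# in which both values are present (here the 1st, 2nd and 5th pairs)
r_xy <- cor(x, y, use = "complete.obs")
```

Note that `cor()` drops a whole pair when either value is missing, so the correlation is computed on fewer observations than each vector contains.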
R object, using the rvest
package. List by region, List by country and
components into three separate R data-frames.
Display the top five rows of each table to check that they were loaded
correctly. top five countries
in terms of the democracy index in 2022. Show only the country
name and the democracy index. bottom countries in 2022. List by country.democracy index in 2022 of
the different world regions given in the List by country
table (each boxplot should represent the distribution of all countries
within a specific region). boxplot.stats command). democracy index in 2022 in the seven different
regions. Do the densities resemble the Normal distribution? Compute
the mean, variance, skewness and
kurtosis for all the distributions, display them in a table and
explain what they mean about the empirical distribution of the
data.
Write a function that receives as input a data-frame, and a
vector of country names (as strings). The function plots the values of
the democracy index of these countries in different colors
as a function of the year (from 2006 to 2022), shown
on the same graph as curves with different colors or symbols. Use
meaningful axis and plot labels, and add an informative legend. Use the
function to plot the democracy index for five countries
of your choice.
Use the same function for the table
List by region where the seven region names are inserted as
input instead of countries, to show changes in the world
regions democracy index over time.
Divide the countries into eight separate groups (clusters) as follows:
Remark: Don’t worry if some of the groups you get are large with countries with very similar colors, and/or a small graph panel due to a large legend.
Change in category:
For each of the four
different regime types (Full democracy,
Flawed democracy, Hybrid regime,
Authoritarian), use the countries democracy index data
frame to estimate the probability that a country moves from one such
regime in \(2006\) to each of the
four regimes in \(2022\). Show the
results (sixteen estimated probabilities) in a \(4\)-by-\(4\) table, and also in a heatmap.
Remarks: Your estimates should simply be the empirical
frequencies (for example, if \(2\) out of
\(20\) countries moved from
Authoritarian in \(2006\)
to Hybrid regime in \(2022\), then get an estimate of \(0.1\) for the probability of such a regime
change).
Use the table By regime type from the
democracy index webpage to determine the regime type category based on
the democracy index value.
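The empirical-frequency estimate described above can be sketched on toy data as follows. The six scores are hypothetical, and the band edges (0 to 4, 4.01 to 6, 6.01 to 8, 8.01 to 10) are an assumption based on the By regime type table; `to_regime` is an illustrative helper name, not part of the lab.

```r
# Hypothetical scores for six countries (not real data)
scores <- data.frame(
  X2006 = c(8.5, 7.2, 5.0, 3.1, 3.9, 2.0),
  X2022 = c(8.2, 5.9, 5.5, 3.0, 4.5, 1.5)
)

# Regime bands as right-closed intervals: [0,4], (4,6], (6,8], (8,10]
breaks <- c(0, 4, 6, 8, 10)
labels <- c("Authoritarian", "Hybrid regime", "Flawed democracy", "Full democracy")
to_regime <- function(score) {
  cut(score, breaks = breaks, labels = labels, include.lowest = TRUE)
}

# cut() returns a factor with a fixed level order, so table() keeps all four
# categories, in the same order, on both axes (including empty ones)
counts <- table(from_2006 = to_regime(scores$X2006),
                to_2022   = to_regime(scores$X2022))

# Row-wise proportions: each row (regime in 2006) sums to 1
probs <- prop.table(counts, margin = 1)
round(probs, 2)
```

Using a factor with explicit levels avoids relying on the alphabetical ordering that `table()` applies to character vectors, which matters when the matrix is relabelled or plotted later.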
Joining data from additional tables:
rvest library. R
data-frame. democracy index at \(2022\) as the predictor and
GDP (PPP) per capita (use the CIA estimates) as the
response, and report the regression results. GDP (PPP) per capita (y-axis) vs. the
democracy index at \(2022\), with the fitted regression line.
Describe your results. incarceration rate (per 100,000) as the response
Empirical Cumulative Distribution Function (CDF):
GDP (PPP) per capita of a randomly
selected country in 2022, where countries are selected
uniformly at random from all world countries. Compute and plot the
empirical CDF of \(X\). GDP (PPP) per capita of a randomly
selected person in the world in 2022, where a person is
selected uniformly at random from all world population. Compute and plot
the empirical CDF for \(Y\) and explain
the differences from the distribution for \(X\). Remark: Use the
population size data to compute the empirical CDF for this case. It is
possible to use the library spatstat.geom.
GDP (PPP) per capita of a randomly
selected person in the world in 2022, where the
location of the person is selected uniformly at random from all
the land area on earth. Compute and plot the empirical CDF for \(Z\) and explain the differences from the
distribution for \(X\). Compare the
median, and the \(25\%\) and \(75\%\) percentiles of \(X,Y\) and \(Z\). Are they similar or different?
Explain. Remark: Use the countries' land area (in \(km^2\) or \(mi^2\)) to compute the empirical CDF for
this case. You will need to parse the corresponding column to get the
numerical data.
Displaying data on the world map:
Use the
rworldmap package to display the world map and color each
country based on the average democracy index across the
years from \(2006\) to \(2022\). Describe the resulting map in a
couple of sentences.
Next, repeat all parts above, but this time
display in the map the difference in the index between
\(2022\) and \(2006\).
Guidance: Use the joinCountryData2Map
and mapCountryData commands to make the plots. Keep
countries with missing data in white.
Components of the Democracy Index:
components table with the main table from the
previous questions. Display the top five rows. Next, compute the
correlation between all pairs of the five democracy components
(Electoral process and pluralism,
Functioning of government,
Political participation, Political culture and
Civil liberties), and plot the resulting \(5\)-by-\(5\) correlations matrix in a heatmap. (It
is possible to use the corrplot library). components table,
and the response variable that you try to predict the
GDP (PPP) per capita of each country. GDP (PPP) per capita?
Good luck!
Solution: (Fill code, text, plots etc.)
1.a. Loading the data via URL connection:
democracy <- read_html("https://en.wikipedia.org/wiki/Democracy_Index")
all.tables = html_nodes(democracy, "table")
# Use html_table to extract the individual tables from the all.tables object:
categories <- as.data.frame(html_table(all.tables[3], fill = TRUE)) # Example
# Following the example, extract the remaining tables in page order and name them accordingly
list_by_region <- as.data.frame(html_table(all.tables[4], fill = TRUE))
list_by_region22 <- as.data.frame(html_table(all.tables[5], fill = TRUE))
list_by_country <- as.data.frame(html_table(all.tables[6], fill = TRUE))
components <- as.data.frame(html_table(all.tables[7], fill = TRUE))
# Display the first rows of each table to check that they were loaded correctly
head(categories)
## Type.of.regime Score Countries
## 1 Type of regime Score Number
## 2 Full democracies 9.01–10.00  8.01–9.00 24
## 3 Flawed democracies 7.01–8.00  6.01–7.00 48
## 4 Hybrid regimes 5.01–6.00  4.01–5.00 36
## 5 Authoritarian regimes 3.01–4.00  2.01–3.00   1.01–2.00   0.00–1.00 59
## Countries.1 Proportion.ofWorld.population....
## 1 (%) Proportion ofWorld population (%)
## 2 14.4% 8.0%
## 3 28.7% 37.3%
## 4 21.6% 17.9%
## 5 35.3% 36.9%
head(list_by_region)
## Region Coun.tries X2022 X2021 X2020 X2019 X2018
## 1 North America 2 8.37 8.36 8.58 8.59 8.56
## 2 Western Europe 21 8.36 8.23 8.29 8.35 8.35
## 3 Latin America and the Caribbean 24 5.79 5.83 6.09 6.13 6.24
## 4 Asia and Australasia 28 5.46 5.46 5.62 5.67 5.67
## 5 Central and Eastern Europe 28 5.39 5.36 5.36 5.42 5.42
## 6 Sub-Saharan Africa 44 4.14 4.12 4.16 4.26 4.36
## X2017 X2016 X2015 X2014 X2013 X2012 X2011 X2010 X2008 X2006
## 1 8.56 8.56 8.56 8.59 8.59 8.59 8.59 8.63 8.64 8.64
## 2 8.38 8.40 8.42 8.41 8.41 8.44 8.40 8.45 8.61 8.60
## 3 6.26 6.33 6.37 6.36 6.38 6.36 6.35 6.37 6.43 6.37
## 4 5.63 5.74 5.74 5.70 5.61 5.56 5.51 5.53 5.58 5.44
## 5 5.40 5.43 5.55 5.58 5.53 5.51 5.50 5.55 5.67 5.76
## 6 4.35 4.37 4.38 4.34 4.36 4.33 4.32 4.23 4.28 4.24
head(list_by_country)
## Region X2022.rank Country Regime.type X2022 X2021 X2020
## 1 North America 12 Canada Full democracy 8.88 8.87 9.24
## 2 North America 30 United States Flawed democracy 7.85 7.85 7.92
## 3 Western Europe 20 Austria Full democracy 8.20 8.07 8.16
## 4 Western Europe 36 Belgium Flawed democracy 7.64 7.51 7.51
## 5 Western Europe 37 Cyprus Flawed democracy 7.38 7.43 7.56
## 6 Western Europe 6 Denmark Full democracy 9.28 9.09 9.15
## X2019 X2018 X2017 X2016 X2015 X2014 X2013 X2012 X2011 X2010 X2008 X2006
## 1 9.22 9.15 9.15 9.15 9.08 9.08 9.08 9.08 9.08 9.08 9.07 9.07
## 2 7.96 7.96 7.98 7.98 8.05 8.11 8.11 8.11 8.11 8.18 8.22 8.22
## 3 8.29 8.29 8.42 8.41 8.54 8.54 8.48 8.62 8.49 8.49 8.49 8.69
## 4 7.64 7.78 7.78 7.77 7.93 7.93 8.05 8.05 8.05 8.05 8.16 8.15
## 5 7.59 7.59 7.59 7.65 7.53 7.40 7.29 7.29 7.29 7.29 7.70 7.60
## 6 9.22 9.22 9.22 9.20 9.11 9.11 9.38 9.52 9.52 9.52 9.52 9.52
head(components)
## Rank
## 1
## 2 Full democracies
## 3 1
## 4 2
## 5 3
## 6 4
## .mw.parser.output..tooltip.dotted.border.bottom.1px.dotted.cursor.help.Δ.Rank
## 1
## 2 Full democracies
## 3
## 4
## 5 2
## 6
## Country Regime.type Overall.score Δ.Score
## 1
## 2 Full democracies Full democracies Full democracies Full democracies
## 3 Norway Full democracy 9.81 0.06
## 4 New Zealand Full democracy 9.61 0.14
## 5 Iceland Full democracy 9.52 0.34
## 6 Sweden Full democracy 9.39 0.13
## Elec.toral.pro.cessand.plura.lism Func.tioningof.govern.ment
## 1
## 2 Full democracies Full democracies
## 3 10.00 9.64
## 4 10.00 9.29
## 5 10.00 9.64
## 6 9.58 9.64
## Poli.ticalpartici.pation Poli.ticalcul.ture Civilliber.ties
## 1
## 2 Full democracies Full democracies Full democracies
## 3 10.00 10.00 9.41
## 4 10.00 8.75 10.00
## 5 8.89 9.38 9.71
## 6 8.33 10.00 9.41
1.b.
# Select the top 5 countries with the highest democracy index in 2022
# (head() returns 6 rows by default, so n = 5 is given explicitly)
top <- select(list_by_country, Country, X2022) %>% arrange(desc(X2022)) %>% head(n = 5)
# Select the bottom 5 countries with the lowest democracy index in 2022
bottom <- select(list_by_country, Country, X2022) %>% arrange(X2022) %>% head(n = 5)
# Display the top and bottom countries
top
## Country X2022
## 1 Norway 9.81
## 2 New Zealand 9.61
## 3 Iceland 9.52
## 4 Sweden 9.39
## 5 Finland 9.29
bottom
## Country X2022
## 1 Afghanistan 0.32
## 2 Myanmar 0.74
## 3 North Korea 1.08
## 4 Central African Republic 1.35
## 5 Syria 1.43
# Calculate the average democracy index for each country
averages <- list_by_country %>%
mutate(avg = rowMeans(select(., 5:19), na.rm = TRUE)) %>%
select(Country, avg)
# Select the top 5 and bottom 5 countries by average democracy index (n = 5, since head() defaults to 6 rows)
top_averages <- averages %>% arrange(desc(avg)) %>% head(n = 5)
bottom_averages <- averages %>% arrange(avg) %>% head(n = 5)
# Display the top and bottom countries based on average democracy index
top_averages
## Country avg
## 1 Norway 9.830667
## 2 Iceland 9.562000
## 3 Sweden 9.524667
## 4 Denmark 9.305333
## 5 New Zealand 9.268667
bottom_averages
## Country avg
## 1 North Korea 1.062000
## 2 Chad 1.569333
## 3 Central African Republic 1.581333
## 4 Syria 1.700667
## 5 Turkmenistan 1.741333
The top five countries in terms of the democracy index in 2022 are Norway, New Zealand, Iceland, Sweden and Finland.
The bottom five countries in terms of the democracy index in 2022 are Afghanistan, Myanmar, North Korea, Central African Republic and Syria.
The top five countries according to the average index value over all 15 reported years are Norway, Iceland, Sweden, Denmark and New Zealand. Four of them are also in the 2022 top five; Denmark takes the place of Finland, and the order differs.
The bottom five countries according to the average index value over all 15 reported years are North Korea, Chad, Central African Republic, Syria and Turkmenistan.
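As a small illustration of selecting exactly five rows, the sketch below uses base R on hypothetical scores (the country labels A to G and their values are made up). It relies on two documented defaults: `head()` returns six rows unless `n` is given, and `order()` places `NA` last.

```r
# Hypothetical index values (not real data); one value is missing
scores <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F", "G"),
  X2022   = c(9.1, 3.2, 7.5, NA, 8.8, 1.4, 6.0)
)

# order() puts NA last for both sort directions, so the missing value
# does not enter either selection while five non-missing rows exist
top5    <- head(scores[order(scores$X2022, decreasing = TRUE), ], n = 5)
bottom5 <- head(scores[order(scores$X2022), ], n = 5)

top5
bottom5
```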
2.a.
# Create a boxplot showing the distribution of democracy index in 2022 by region
ggplot(list_by_country, aes(x = Region, y = `X2022`, fill = Region)) +
geom_boxplot() +
xlab("Region") +
ylab("Democracy Index") +
ggtitle("Boxplots by Region") +
theme(axis.title.x = element_text(margin = margin(t = 10)),
axis.text.x = element_text(angle = 45, hjust = 1.1, vjust = 1),
plot.title = element_text(face = "bold", hjust = 0.5))
# Find the outliers for each region
outliers <-list_by_country %>% group_by(Region) %>% summarise(outliers = paste(boxplot.stats(`X2022`)$out, collapse = ", "))
# Display the outliers for each region
outliers
## # A tibble: 7 × 2
## Region outliers
## <chr> <chr>
## 1 Asia and Australasia ""
## 2 Central and Eastern Europe ""
## 3 Latin America and the Caribbean ""
## 4 Middle East and North Africa "7.93"
## 5 North America ""
## 6 Sub-Saharan Africa ""
## 7 Western Europe "4.35"
Asia and Australasia has the largest gap between the minimum and the maximum index, which indicates that its countries span a wide range of values. Its median is approximately 6.25, the lower quartile approximately 3.75 and the upper quartile approximately 7.25. The majority of countries in this region are flawed democracies. It has no outliers.
For Central and Eastern Europe, the box is about as wide as in the previous region, but the gap between the maximum and the minimum is smaller. Its median is approximately 6.25, the lower quartile approximately 3.60 and the upper quartile approximately 7. The majority of countries in this region are flawed democracies. It has no outliers. This region closely resembles the previous one.
For Latin America and the Caribbean, the box is much narrower than in the two previous regions, which indicates that the central half of the values is clustered closely together, but the gap between the minimum and the maximum index is large. Its median is approximately 6.25, the lower quartile approximately 5 and the upper quartile approximately 7.1. The majority of countries in this region are flawed democracies. It has no outliers.
For the Middle East and North Africa, the box is narrower than in the previous region, which indicates that the values are clustered closely together and have limited variability; the gap between the minimum and the maximum index is also small. Its median is the lowest of all regions, approximately 3.1, with the lower quartile at approximately 2.5 and the upper quartile at approximately 3.75. The majority of countries in this region are authoritarian regimes. It has one outlier, at 7.93.
For North America, the box is narrower than in every other region because the region contains only two countries, whose regime types are similar. The median is approximately 8.4, and the quartiles, the minimum and the maximum all lie between 7.75 and 9. One of the two countries is a flawed democracy and the other a full democracy. It has no outliers.
For Sub-Saharan Africa, the box is about as wide as in the first two regions, which suggests considerable variability of the index within the region. Its median is approximately 3.75, the lower quartile approximately 3.1 and the upper quartile approximately 5.3. The democracy index ranges from about 1.25 to 8. The majority of countries in this region are authoritarian regimes. It has no outliers.
For Western Europe, the box is narrower than in the previous region, which indicates that the values are clustered closely together and have limited variability; apart from the outlier, the index ranges only from about 7.4 to 9.9. Its median is approximately 8, the lower quartile approximately 7.8 and the upper quartile approximately 8.9. The majority of countries in this region are full democracies. This region looks a lot like North America. It has one outlier, at 4.35.
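To make the outlier rule behind `boxplot.stats` concrete, the sketch below recomputes it by hand on a hypothetical vector of scores. With the default `coef = 1.5`, a value is flagged when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge; the names `lower_fence` and `upper_fence` are illustrative.

```r
# Hypothetical scores (not real data)
x <- c(2.1, 2.8, 3.0, 3.3, 3.6, 3.9, 7.9)

# boxplot.stats() returns the five-number summary in $stats and the outliers in $out
stats  <- boxplot.stats(x)
hinges <- stats$stats[c(2, 4)]   # lower and upper hinge
spread <- diff(hinges)           # hinge spread (close to the IQR)

# Fences at 1.5 times the hinge spread beyond each hinge
lower_fence <- hinges[1] - 1.5 * spread
upper_fence <- hinges[2] + 1.5 * spread

# Values outside the fences, computed by hand, match stats$out
manual_out <- x[x < lower_fence | x > upper_fence]
manual_out
stats$out
```

The hinges come from `fivenum()` and can differ slightly from the quartiles returned by `quantile()`, which is one reason values read off a plot are only approximate.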
2.b.
# Create a density plot to visualize the distribution of the democracy index in 2022 by region
ggplot(list_by_country, aes(x = `X2022`, fill = Region)) +
geom_density(alpha = 0.2) +
xlab("Democracy Index 2022") +
ylab("Density") +
ggtitle("Density Plot by Region") +
theme(legend.position = "top")
# Compute summary statistics for each region
summary_table <- list_by_country %>%
group_by(Region) %>%
summarize(
Mean = mean(`X2022`, na.rm = TRUE),
Variance = var(`X2022`, na.rm = TRUE),
Skewness = moments::skewness(`X2022`, na.rm = TRUE),
Kurtosis = moments::kurtosis(`X2022`, na.rm = TRUE)
)
summary_table
## # A tibble: 7 × 5
## Region Mean Variance Skewness Kurtosis
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Asia and Australasia 5.46 6.68 -0.529 2.32
## 2 Central and Eastern Europe 5.39 4.23 -0.649 1.99
## 3 Latin America and the Caribbean 5.79 3.40 -0.504 2.52
## 4 Middle East and North Africa 3.34 2.22 1.54 5.65
## 5 North America 8.36 0.530 0 1
## 6 Sub-Saharan Africa 4.14 3.18 0.506 2.37
## 7 Western Europe 8.36 1.36 -1.86 7.64
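The right-skewed regions (positive skewness) are Middle East and North Africa and Sub-Saharan Africa. The left-skewed regions (negative skewness) are Asia and Australasia, Central and Eastern Europe, Latin America and the Caribbean and Western Europe.
The kurtosis returned by moments::kurtosis is not the excess kurtosis, so the Normal distribution corresponds to a value of 3 rather than 0. Only Middle East and North Africa (5.65) and Western Europe (7.64) exceed 3, which indicates heavier tails than the Normal distribution and is consistent with the single outlier found in each of these regions. The other regions have values below 3, indicating lighter tails. North America contains only two countries, so its skewness of 0 and kurtosis of 1 carry no information about the shape of the distribution. Given the non-zero skewness in every region with more than two countries, the densities depart from the Normal distribution to varying degrees.
As we saw in the previous question, Asia and Australasia has a large variance, whereas North America has a very low variance. Furthermore, the average index places Asia and Australasia, Central and Eastern Europe, Latin America and the Caribbean and Sub-Saharan Africa in the hybrid regime range, the Middle East and North Africa in the authoritarian range, and North America and Western Europe in the full democracy range.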
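The sketch below writes out moment-based skewness and non-excess kurtosis in base R, to show which reference values apply when reading the table. The helper names `pearson_skewness` and `pearson_kurtosis` are hypothetical; the formulas are assumed to match the convention of the moments package, under which a Normal sample gives a kurtosis near 3. The two-point example uses the North America scores shown earlier (8.88 and 7.85).

```r
# Moment-based skewness: third central moment over the 1.5 power of the second
pearson_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based (non-excess) kurtosis: fourth central moment over the squared second
pearson_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Any two distinct values are symmetric around their mean, so the skewness is 0
# and the kurtosis is 1 (up to floating-point rounding), whatever the values are
two_points <- c(8.88, 7.85)
pearson_skewness(two_points)
pearson_kurtosis(two_points)

# A large simulated Normal sample has kurtosis close to 3, not 0
set.seed(1)
z <- rnorm(1e5)
pearson_kurtosis(z)
```

Under this convention the comparison value for "more peaked or heavier-tailed than Normal" is 3; functions that report excess kurtosis subtract 3, so their comparison value is 0.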
3.a.
# Define a function to compare democracy index across countries
Comparing_countries <- function(data, country_names) {
# Check if the data has a column named "Country", otherwise use "Region"
if ("Country" %in% colnames(data)) {
place_col <- "Country"
} else {
place_col <- "Region"
}
# Extract the year columns from the data
year_cols <- grep("^X[0-9]{4}$", colnames(data), value = TRUE)
years <- unique(as.numeric(substr(year_cols, 2,5)))
# Filter the data for the specified country names
fil_data <- data[data[[place_col]] %in% country_names, ]
color <- rainbow(length(country_names))
# Create an empty plot with axis labels and a title; xaxt = "n" suppresses
# the default x-axis so that the years can be drawn as ticks below
plot(NULL, xlim = range(years), ylim = c(0, 10), xaxt = "n",
xlab = "Year", ylab = "Democracy Index",
main = "Democracy Index for Countries")
# Set the x-axis ticks to the years
axis(1, at = years, labels = years)
# Loop through each country (or region) and plot its democracy index over the years
for (i in seq_along(country_names)) {
place <- country_names[i]
country_data <- fil_data[fil_data[[place_col]] == place, ]
lines(years, t(country_data[year_cols]), col = color[i])
}
# Add a legend showing the names and the corresponding line colors
legend("bottomright", inset = 0.02, legend = country_names,
col = color, lty = 1, bty = "n")
}
# Call the Comparing_countries function with a specific set of country names
Comparing_countries(list_by_country, c("France", "United States", "Israel", "Cameroon", "Morocco"))
First, three countries look alike: the democracy indexes of France, the United States and Israel changed little over the years and stayed high. Second, the other two countries, Cameroon and Morocco, were similar from 2006 to 2014 but diverged from 2015 onward: Morocco became slightly more democratic and Cameroon slightly less.
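The function above reads the years out of column names such as X2006. As a minimal base R sketch of the same idea, the toy table below (hypothetical countries and values) is reshaped from one column per year to one row per country and year, which is the layout that plotting functions such as ggplot generally expect.

```r
# Hypothetical wide table: one row per country, one column per year
wide <- data.frame(
  Country = c("A", "B"),
  X2022   = c(8.1, 3.2),
  X2006   = c(7.9, 3.0)
)

# Year columns are those named "X" followed by four digits
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# unlist() stacks the year columns one after another, so the country names
# are repeated once per year column and each year once per country
long <- data.frame(
  Country = rep(wide$Country, times = length(year_cols)),
  Year    = rep(as.integer(sub("^X", "", year_cols)), each = nrow(wide)),
  Index   = unlist(wide[year_cols], use.names = FALSE)
)

# Sort by country and year so each country's series is in time order
long <- long[order(long$Country, long$Year), ]
long
```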
3.b.
# Calculate the difference in democracy index between 2022 and 2006
index_diff <- list_by_country$X2022 - list_by_country$X2006
# First cluster - Countries with a big increase in democracy index
big_increase_cluster <- list_by_country[index_diff >= 1.5, "Country"]
# Second cluster - Countries with a big decrease in democracy index
big_decrease_cluster <- list_by_country[index_diff <= -1.5, "Country"]
# Third cluster - Countries with a small increase in democracy index
small_increase_cluster <- list_by_country[index_diff > 0.75 & index_diff <= 1.5, "Country"]
# Fourth cluster - Countries with a small decrease in democracy index
small_decrease_cluster <- list_by_country[index_diff < -0.75 & index_diff >= -1.5, "Country"]
# Calculate the minimum index for each country over columns 6:19 (X2021 to X2006; X2022 in column 5 is not included)
list_by_country$min_index <- apply(list_by_country[, 6:19], 1, min)
# Fifth cluster - Countries that dropped at least 0.75 below their 2006 value and were at least 0.75 above that minimum in 2022
index_min_diff_2006 <- list_by_country$X2006 - list_by_country$min_index
index_min_diff_2022 <- list_by_country$X2022 - list_by_country$min_index
decrease_increase <- list_by_country$Country[index_min_diff_2006 >= 0.75 & index_min_diff_2022 >= 0.75]
# Calculate the maximum index for each country over columns 6:19 (X2021 to X2006; X2022 in column 5 is not included)
list_by_country$max_index <- apply(list_by_country[, 6:19], 1, max)
# Sixth cluster - Countries that rose at least 0.75 above their 2006 value and were at least 0.75 below that maximum in 2022
index_max_diff_2006 <- list_by_country$X2006 - list_by_country$max_index
index_max_diff_2022 <- list_by_country$X2022 - list_by_country$max_index
increase_decrease <- list_by_country$Country[index_max_diff_2006 <= -0.75 & index_max_diff_2022 <= -0.75]
# Seventh cluster - Countries with minimal change in democracy index (small difference between max and min)
bare_change <- list_by_country$Country[list_by_country$max_index-list_by_country$min_index < 0.5 ]
# Eighth cluster - Other countries not included in the previous clusters
other_countries <- list_by_country$Country[!(list_by_country$Country %in% c(big_increase_cluster,big_decrease_cluster,small_increase_cluster, small_decrease_cluster, decrease_increase, increase_decrease, bare_change))]
# Call the Comparing_countries function to compare countries within each cluster
Comparing_countries(list_by_country, big_increase_cluster)
Comparing_countries(list_by_country, big_decrease_cluster)
Comparing_countries(list_by_country, small_increase_cluster)
Comparing_countries(list_by_country, small_decrease_cluster)
Comparing_countries(list_by_country, decrease_increase)
Comparing_countries(list_by_country, increase_decrease)
Comparing_countries(list_by_country, bare_change)
Comparing_countries(list_by_country, other_countries)
First cluster: the three countries started with a low index, between 2 and 3, in 2006 and all ended with an index between 3 and 6 in 2022. Tunisia reached the highest value, peaking in 2015.
Second cluster: 13 countries that all finished with a much lower democracy index in 2022 than in 2006. The largest drop is Afghanistan's.
Third cluster: 14 countries that all finished with a slightly higher democracy index in 2022 than in 2006. For example, Uruguay started with an index of around 7.95 and finished in 2022 with an index of around 8.9.
Fourth cluster: countries that all finished with a slightly lower democracy index in 2022 than in 2006. For example, Myanmar started with an index of around 1.7 and finished in 2022 with an index of around 0.7.
Fifth cluster: 7 countries that dropped by at least 0.75 points after 2006 and then recovered by at least 0.75 points by 2022 relative to their lowest point. For example, Gambia had a democracy index of approximately 4.5 in 2006, reached its lowest point, around 3, in 2016 and recovered to approximately 4.4 by 2022.
Sixth cluster: similarly, 10 countries that increased by at least 0.75 points after 2006 and then dropped by at least 0.75 points by 2022 relative to their highest point. For example, Libya had a democracy index of approximately 2 in 2006, reached its highest point, around 5, in 2012 and dropped back to approximately 2 by 2022.
Seventh cluster: countries that barely changed from 2006 to 2022, i.e. the difference between their highest and lowest index was less than 0.5 points. For example, Chad had its highest index, 1.67, in 2021 and its lowest index, 1.5, in 2013.
Eighth cluster: all remaining countries, which did not fit into any of the seven previous clusters.
4.
# Keep the country name and the index values for 2006 and 2022
data_for_freq <- select(list_by_country,Country,X2006,X2022)
# Initialize empty columns for regime types
data_for_freq$Regime_Type_2006 <- NA
data_for_freq$Regime_Type_2022 <- NA
# Assign regime type based on score conditions for 2006
data_for_freq$Regime_Type_2006 <- ifelse(data_for_freq$X2006 >= 8.01 & data_for_freq$X2006 <= 10, "Full democracies",
ifelse(data_for_freq$X2006 >= 6.01 & data_for_freq$X2006 <= 8, "Flawed democracies",
ifelse(data_for_freq$X2006 >= 4.01 & data_for_freq$X2006 <= 6, "Hybrid regimes",
ifelse(data_for_freq$X2006 >= 0 & data_for_freq$X2006 <= 4, "Authoritarian regimes", NA))))
# Assign regime type based on score conditions for 2022
data_for_freq$Regime_Type_2022 <- ifelse(data_for_freq$X2022 >= 8.01 & data_for_freq$X2022 <= 10, "Full democracies",
ifelse(data_for_freq$X2022 >= 6.01 & data_for_freq$X2022 <= 8, "Flawed democracies",
ifelse(data_for_freq$X2022 >= 4.01 & data_for_freq$X2022 <= 6, "Hybrid regimes",
ifelse(data_for_freq$X2022 >= 0 & data_for_freq$X2022 <= 4, "Authoritarian regimes", NA))))
# Create a contingency table of regime types for 2006 and 2022
data_for_freqs <- table(data_for_freq$Regime_Type_2006 ,data_for_freq$Regime_Type_2022)
# Compute transition probabilities
transition_probabilities <- prop.table(data_for_freqs, margin = 1)
transition_probabilities
##
## Authoritarian regimes Flawed democracies
## Authoritarian regimes 0.83636364 0.00000000
## Flawed democracies 0.01886792 0.69811321
## Full democracies 0.00000000 0.23076923
## Hybrid regimes 0.36363636 0.15151515
##
## Full democracies Hybrid regimes
## Authoritarian regimes 0.00000000 0.16363636
## Flawed democracies 0.07547170 0.20754717
## Full democracies 0.76923077 0.00000000
## Hybrid regimes 0.00000000 0.48484848
# Define the categories for regime types
regime_categories <- c("Full democracy", "Flawed democracy", "Hybrid regime", "Authoritarian")
# Create a matrix of transition probabilities with row and column names
prob_table <- matrix(transition_probabilities, nrow = 4, byrow = TRUE, dimnames = list(regime_categories, regime_categories))
# Create a heatmap to visualize the regime transition probabilities
heatmap(prob_table, col = colorRampPalette(c("white", "green"))(20), main = "Regime Transition Probabilities")
This table shows the estimated probabilities of moving from one regime type to another. Each row represents the regime type in 2006, each column represents the regime type in 2022, and each row sums to 1. For example, a country that was authoritarian in 2006 remained authoritarian in 2022 with estimated probability 0.84 and became a hybrid regime with probability 0.16.
The table also shows that a hybrid regime in 2006 has an estimated probability of 0.36 of being authoritarian in 2022 and of 0.15 of becoming a flawed democracy, while a flawed democracy rarely becomes authoritarian (0.02). No full democracy dropped below the flawed democracy category.
5.a.
gdp_url<- read_html("https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita")
population_size_url<- read_html("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population")
incarnation_rates_url<- read_html("https://en.wikipedia.org/wiki/List_of_countries_by_incarceration_rate")
area_url<- read_html("https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area")
# Extracting all the tables from HTML data
gdp.tables = html_nodes(gdp_url, "table")
gdp_table <- as.data.frame(html_table(gdp.tables[2], fill = TRUE))
pop.tables = html_nodes(population_size_url, "table")
population_size_table <- as.data.frame(html_table(pop.tables[2], fill = TRUE))
incar.tables = html_nodes(incarnation_rates_url, "table")
incarnation_rate_table <- as.data.frame(html_table(incar.tables[2], fill = TRUE))
area.tables = html_nodes(area_url, "table")
area_table <- as.data.frame(html_table(area.tables[2], fill = TRUE))
# Renaming columns of the tables so they are more intuitive to use
colnames(gdp_table)[colnames(gdp_table) == "Country.Territory"] <- "Country"
colnames(population_size_table)[colnames(population_size_table) == "Country...Dependency"] <- "Country"
colnames(incarnation_rate_table)[colnames(incarnation_rate_table) == "Location"] <- "Country"
colnames(area_table)[colnames(area_table) == "Country...Dependency"] <- "Country"
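# Cleaning country names in the GDP table: drop a trailing asterisk
gdp_table$Country <- gsub("\\*$", "", gdp_table$Country)
# Replace non-breaking spaces with regular spaces and trim the ends; removing every
# space would turn "United States" into "UnitedStates" and break the merge below
gdp_table$Country <- trimws(gsub("\u00a0", " ", gdp_table$Country))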
# Merging tables based on the "Country" column
joined_table <- merge(list_by_country, gdp_table, by = "Country", all.x = TRUE)
joined_table <- merge(joined_table, population_size_table, by = "Country", all.x = TRUE)
joined_table <- merge(joined_table, incarnation_rate_table, by = "Country", all.x = TRUE)
joined_table <- merge(joined_table, area_table, by = "Country", all.x = TRUE)
joined_table <- as.data.frame(joined_table)
# Displaying the head of the joined table
head(joined_table)
## Country Region.x X2022.rank Regime.type X2022
## 1 Afghanistan Asia and Australasia 167 Authoritarian 0.32
## 2 Albania Central and Eastern Europe 64 Flawed democracy 6.41
## 3 Algeria Middle East and North Africa 113 Authoritarian 3.66
## 4 Angola Sub-Saharan Africa 109 Authoritarian 3.96
## 5 Argentina Latin America and the Caribbean 50 Flawed democracy 6.85
## 6 Armenia Central and Eastern Europe 82 Hybrid regime 5.63
## X2021 X2020 X2019 X2018 X2017 X2016 X2015 X2014 X2013 X2012 X2011 X2010 X2008
## 1 0.32 2.85 2.85 2.97 2.55 2.55 2.77 2.77 2.48 2.48 2.48 2.48 3.02
## 2 6.11 6.08 5.89 5.98 5.98 5.91 5.91 5.67 5.67 5.67 5.81 5.86 5.91
## 3 3.77 3.77 4.01 3.50 3.56 3.56 3.95 3.83 3.83 3.83 3.44 3.44 3.32
## 4 3.37 3.66 3.72 3.62 3.62 3.40 3.35 3.35 3.35 3.35 3.32 3.32 3.35
## 5 6.81 6.95 7.02 7.02 6.96 6.96 7.02 6.84 6.84 6.84 6.84 6.84 6.63
## 6 5.49 5.35 5.54 4.79 4.11 3.88 4.00 4.13 4.02 4.09 4.09 4.09 4.09
## X2006 min_index max_index UN.Region IMF.5..6. IMF.5..6..1 World.Bank.7.
## 1 3.06 0.32 3.06 Asia 2,456 2020 1,666
## 2 5.91 5.67 6.11 Europe 19,029 2023 15,709
## 3 3.17 3.17 4.01 Africa 13,507 2023 12,128
## 4 2.41 2.41 3.72 Africa 7,222 2023 6,491
## 5 6.63 6.63 7.02 Americas 27,261 2023 23,650
## 6 4.15 3.88 5.54 Asia 19,489 2023 15,593
## World.Bank.7..1 CIA.8..9..10. CIA.8..9..10..1 Rank.x Population Population.1
## 1 2021 1,500 2021 46 32,890,171 0.409%
## 2 2021 14,500 2021 137 2,793,592 0.0348%
## 3 2021 11,000 2021 32 45,400,000 0.565%
## 4 2021 5,900 2021 44 33,086,278 0.412%
## 5 2021 21,500 2021 31 46,044,703 0.573%
## 6 2021 14,200 2021 134 2,981,200 0.0371%
## Date Source..official.or.from.the.United.Nations. Notes.x Region.y
## 1 1 Jul 2020 Official estimate[48] <NA>
## 2 1 Jan 2022 Official estimate[134] <NA>
## 3 1 Jan 2022 Official estimate[35] Africa
## 4 30 Jun 2022 National annual projection[46] Africa
## 5 18 May 2022 2022 census preliminary result[34] <NA>
## 6 1 Jan 2023 National quarterly estimate[131] <NA>
## Count.2. Rate.per.100.000..3. Male.....a. Female.....4. National.....b.
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 94,749 217 98.5 1.5 96.2
## 4 24,966 79 97.3 2.7 —
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA>
## Foreign.....5. Occupancy.....6. Remand.....7. Rank.y Totalin.km2..mi2.
## 1 <NA> <NA> <NA> 40 652,867 (252,073)
## 2 <NA> <NA> <NA> 140 28,748 (11,100)
## 3 3.8 89.3 12.0 10 2,381,741 (919,595)
## 4 — 110.8 45.8 22 1,246,700 (481,400)
## 5 <NA> <NA> <NA> 8 2,780,400 (1,073,500)
## 6 <NA> <NA> <NA> 138 29,743 (11,484)
## Landin.km2..mi2. Waterin.km2..mi2. X.water Notes.y
## 1 652,867 (252,073) 0 (0) 0 <NA>
## 2 27,398 (10,578) 1,350 (520) 4.7
## 3 2,381,741 (919,595) 0 (0) 0 [Note 13]
## 4 1,246,700 (481,400) 0 (0) 0
## 5 2,736,690 (1,056,640) 43,710 (16,880) 1.6 [Note 11]
## 6 28,342 (10,943) 1,401 (541) 4.7
The output shows the first six rows of the new joined table with all the necessary information added, such as the incarceration rate and the population of each country. For example, the first row is Afghanistan, and we can see its rank in 2022, its regime type and its democracy index scores for the years 2006 to 2022.
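Because the merges above use all.x = TRUE, a country whose name is spelled differently in one of the tables is kept but silently receives NA in the new columns. Below is a minimal sketch of how one might normalize names and list the unmatched countries before merging. The helpers clean_country and unmatched_countries and the toy data frames are hypothetical and are not part of the lab's code.

```r
# Hypothetical helper: strip non-breaking spaces, bracketed footnotes,
# asterisks and surrounding whitespace from country names.
clean_country <- function(x) {
  x <- gsub("\u00a0", " ", x)        # non-breaking space -> regular space
  x <- gsub("\\[[^]]*\\]", "", x)    # bracketed footnotes, e.g. "[a]"
  x <- gsub("\\*", "", x)            # asterisk markers
  trimws(x)
}

# Hypothetical helper: names present in `left` but missing from `right`.
unmatched_countries <- function(left, right, key = "Country") {
  setdiff(clean_country(left[[key]]), clean_country(right[[key]]))
}

# Toy tables for illustration only.
demo_index <- data.frame(Country = c("Albania", "Angola", "Czech Republic"))
demo_gdp <- data.frame(Country = c("Albania*", "Angola\u00a0", "Czechia"))

unmatched_countries(demo_index, demo_gdp)
```

Any name returned by such a check would need to be renamed by hand in one of the tables before the merge.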
5.b.
# Removing commas from all columns of the joined table
joined_table <- data.frame(lapply(joined_table, function(x) gsub(",", "", x)))
gdp_table <- data.frame(lapply(gdp_table, function(x) gsub(",", "", x)))
# Renaming specific columns in the joined table
colnames(joined_table)[colnames(joined_table) == "IMF.5..6."] <- "IMF.est"
colnames(joined_table)[colnames(joined_table) == "IMF.5..6..1"] <- "IMF.year"
colnames(joined_table)[colnames(joined_table) == "World.Bank.7."] <- "World.bank.est"
colnames(joined_table)[colnames(joined_table) == "World.Bank.7..1"] <- "World.bank.year"
colnames(joined_table)[colnames(joined_table) == "CIA.8..9..10."] <- "CIA.est"
colnames(joined_table)[colnames(joined_table) == "CIA.8..9..10..1"] <- "CIA.year"
# Creating a new data frame with selected columns and removing rows with NA values
data <- na.omit(joined_table[, c("CIA.est", "X2022")])
# Converting selected columns to numeric
data$CIA.est <- as.numeric(data$CIA.est)
data$X2022 <- as.numeric(data$X2022)
# Performing linear regression using CIA.est as the response variable and X2022 as the predictor
reg_mod_cia <- lm(CIA.est ~ X2022, data = data)
# Displaying the summary of the linear regression model
summary(reg_mod_cia)
##
## Call:
## lm(formula = CIA.est ~ X2022, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27148 -11701 -3187 6754 80120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6166.1 3551.0 -1.736 0.0844 .
## X2022 5152.2 609.2 8.457 1.48e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18440 on 163 degrees of freedom
## Multiple R-squared: 0.305, Adjusted R-squared: 0.3007
## F-statistic: 71.52 on 1 and 163 DF, p-value: 1.481e-14
# Plotting the relationship between Democracy Index (X2022) and GDP (PPP) per capita (CIA.est)
plot(data$X2022, data$CIA.est, xlab = "Democracy Index", ylab = "GDP (PPP) per capita")
abline(reg_mod_cia, col = "red")
# Renaming specific columns in the joined table
colnames(joined_table)[colnames(joined_table) == "Rate.per.100.000..3."] <- "Rate.100000"
data1 <- na.omit(joined_table[, c("Rate.100000", "X2022")])
# Converting selected columns to numeric
data1$Rate.100000 <- as.numeric(data1$Rate.100000)
data1$X2022 <- as.numeric(data1$X2022)
# Performing linear regression using Rate.100000 as the response variable and X2022 as the predictor
reg_mod_inca <- lm(Rate.100000 ~ X2022, data = data1)
# Displaying the summary of the linear regression model for Incarceration Rate
summary(reg_mod_inca)
##
## Call:
## lm(formula = Rate.100000 ~ X2022, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -124.18 -85.37 -36.29 38.46 435.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 154.699 36.734 4.211 9.68e-05 ***
## X2022 -3.345 7.320 -0.457 0.65
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 122.7 on 54 degrees of freedom
## Multiple R-squared: 0.003852, Adjusted R-squared: -0.0146
## F-statistic: 0.2088 on 1 and 54 DF, p-value: 0.6495
# Plotting the relationship between Democracy Index (X2022) and Incarceration Rate (Rate.100000)
plot(data1$X2022, data1$Rate.100000, xlab = "Democracy Index", ylab = "Incarceration Rate")
abline(reg_mod_inca, col="red")
In the first plot, which relates the democracy index to GDP (PPP) per capita, there is a clear positive association: the fitted slope is about 5,152 Int$ per index point and is highly significant (p = 1.48e-14). R-squared is only 0.305, however, so the index explains roughly 30% of the variance and there are notable exceptions far above the line. We conclude that countries with a higher democracy index tend to have a higher GDP per capita. In contrast, the second plot, which relates the democracy index to the incarceration rate, shows no real connection: the regression line is almost horizontal (slope -3.35, p = 0.65, R-squared 0.004) and the points are widely scattered around it.
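For a regression with a single predictor, R-squared equals the squared Pearson correlation, so the value of 0.305 reported for the GDP model corresponds to a correlation of about 0.55. A small sketch that checks this identity on toy data; the two vectors are made up for illustration and are not taken from the tables.

```r
# Toy data: an index-like predictor and a GDP-like response (illustrative only).
index <- c(2.1, 3.4, 4.8, 5.5, 6.9, 7.7, 8.8, 9.2)
gdp <- c(3000, 9000, 4000, 15000, 22000, 18000, 60000, 45000)

fit <- lm(gdp ~ index)
r_squared <- summary(fit)$r.squared
r <- cor(index, gdp)

# In simple linear regression, R^2 is the squared correlation coefficient.
all.equal(r_squared, r^2)

# Correlation implied by the R^2 reported for the GDP model above.
implied_r <- sqrt(0.305)
implied_r
```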
6.a.
# Find countries that appear more than once
duplicate_countries <- joined_table$Country[duplicated(joined_table$Country)]
# Remove rows with duplicate countries
joined_table <- joined_table[!(joined_table$Country %in% duplicate_countries), ]
# remove all the NA's from the CIA est and convert the data into numeric
x <- as.numeric(na.omit(joined_table$CIA.est))
# Calculate the empirical CDF of X
ecdf_X <- ecdf(x)
plot(ecdf_X, xlab ="x", ylab="ECDF", main= "Empirical CDF", col.main="blue", pch = 16)
This plot shows the empirical CDF of GDP (PPP) per capita over all countries that have a CIA estimate. It is the distribution of X, the GDP per capita of a country selected uniformly at random. The x-axis represents GDP per capita, and the y-axis represents the empirical CDF, that is, the fraction of countries whose GDP per capita is at or below that value.
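The value of an empirical CDF at a threshold is simply the fraction of observations that are at most that threshold. A short sketch showing that ecdf() agrees with this definition, using the CIA estimates of the first five rows displayed earlier (the variable names are illustrative):

```r
# CIA estimates for the first five rows of the joined table shown earlier.
toy_gdp <- c(1500, 14500, 11000, 5900, 21500)

toy_ecdf <- ecdf(toy_gdp)

# F(threshold) = proportion of observations <= threshold
threshold <- 11000
by_hand <- mean(toy_gdp <= threshold)

toy_ecdf(threshold)  # evaluates the step function at the threshold
by_hand              # same value computed from the definition
```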
6.b.
# Convert Population and CIA.est to numeric type
joined_table$Population <- as.numeric(joined_table$Population)
joined_table$CIA.est <- as.numeric(joined_table$CIA.est)
# Select the relevant columns and remove missing values
gdp_pers <- na.omit(joined_table %>% select(Country, CIA.est, Population))
# Round the Population column to whole numbers
gdp_pers$Population <- round(gdp_pers$Population)
# Calculate weights based on population
gdp_pers <- gdp_pers %>% mutate(Weights = gdp_pers$Population / sum(gdp_pers$Population))
# Calculate empirical weighted cumulative distribution function (EW-CDF)
ecdf_Y <- ewcdf(gdp_pers$CIA.est, weights = gdp_pers$Weights)
# Plot the empirical weighted cumulative distribution function (EW-CDF)
plot(ecdf_Y, xlab = "GDP (PPP) per capita in Int$", ylab = "EWCDF",
main = "EW-CDF of GDP per capita of a randomly selected person",
sub = "(Weighted by Population of a person's country)",
col.main = "blue", col.sub = "blue", pch = 16, verticals = TRUE, do.points = FALSE)
In this graph we can see that most of the world's population (about 80%) lives in countries with a GDP per capita of 20,000 or less, while most of the remaining 20% lives in countries between 20,000 and 60,000.
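The weighted CDF used for Y gives each country a jump proportional to its population instead of an equal jump. A minimal base-R sketch of that idea, using three rows from the table shown earlier. The function weighted_cdf_at is a hypothetical helper written for illustration and is not how ewcdf is implemented.

```r
# Hypothetical helper: share of the total weight carried by
# observations that are at or below the threshold.
weighted_cdf_at <- function(values, weights, threshold) {
  sum(weights[values <= threshold]) / sum(weights)
}

# Afghanistan, Albania, Algeria (values from the joined table above).
gdp <- c(1500, 14500, 11000)
pop <- c(32890171, 2793592, 45400000)

weighted <- weighted_cdf_at(gdp, pop, 11000)  # share of people
unweighted <- mean(gdp <= 11000)              # share of countries
c(weighted = weighted, unweighted = unweighted)
```

In this toy case the weighted value is larger than the unweighted one because the two countries at or below the threshold hold most of the population.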
6.c.
# Change column name for better readability
colnames(joined_table)[colnames(joined_table) == "Landin.km2..mi2."] <- "LandArea"
# Extract land area values and convert to numeric type
land_area <- as.numeric(joined_table$LandArea)
## Warning: NAs introduced by coercion
# Remove any parentheses and their contents from land area values
joined_table$LandArea <- gsub("\\s*\\(.*\\)", "", joined_table$LandArea)
# Convert land area values to numeric type
joined_table$LandArea <- as.numeric(joined_table$LandArea)
# Select relevant columns for GDP per capita calculation and remove missing values
gdp_area <- na.omit(joined_table %>% select(Country, CIA.est, LandArea))
# Convert land area values to numeric and round to nearest whole number
gdp_area$LandArea <- as.numeric(gdp_area$LandArea)
gdp_area$LandArea <- round(gdp_area$LandArea)
# Calculate weights based on land area
gdp_area <- gdp_area %>% mutate(Weights = gdp_area$LandArea / sum(gdp_area$LandArea))
# Calculate empirical weighted cumulative distribution function (EW-CDF)
ecdf_Z <- ewcdf(gdp_area$CIA.est, weights = gdp_area$Weights)
# Plot the empirical weighted cumulative distribution function (EW-CDF)
plot(ecdf_Z, xlab = "GDP (PPP) per capita in Int$", ylab = "EWCDF",
main = "EW-CDF of GDP per capita of a randomly selected unit of land",
sub = "(Weighted by each country's land area)",
col.main = "blue", col.sub = "blue", pch = 16, verticals = TRUE, do.points = FALSE)
# Calculate median and percentiles for X, Y, and Z
median_X <- quantile(ecdf_X,probs = 0.5)
median_Y <- quantile(ecdf_Y,probs = 0.5)
median_Z <- quantile(ecdf_Z,probs = 0.5)
percentile_25_X <- quantile(ecdf_X,probs = 0.25)
percentile_25_Y <- quantile(ecdf_Y,probs = 0.25)
percentile_25_Z <- quantile(ecdf_Z,probs = 0.25)
percentile_75_X <- quantile(ecdf_X,probs = 0.75)
percentile_75_Y <- quantile(ecdf_Y,probs = 0.75)
percentile_75_Z <- quantile(ecdf_Z,probs = 0.75)
# Add vertical lines at the quartiles of Z
abline(v = median_Z, col = "red", lty = 2) # 50% quantile (median)
abline(v = percentile_25_Z, col = "green", lty = 2) # 25% quantile
abline(v = percentile_75_Z, col = "blue", lty = 2) # 75% quantile
# Create a data frame for comparison
comparison <- data.frame(Variable = c("X", "Y", "Z"),
Median = c(median_X, median_Y, median_Z),
Percentile_25 = c(percentile_25_X, percentile_25_Y, percentile_25_Z),
Percentile_75 = c(percentile_75_X, percentile_75_Y, percentile_75_Z))
comparison
## Variable Median Percentile_25 Percentile_75
## 1 X 13400 4900 32100
## 2 Y 11900 6600 17600
## 3 Z 17600 11000 41900
This graph shows the distribution of GDP per capita weighted by land area, which is more spread out than the population-weighted one: about 60% of the total land area belongs to countries with a GDP per capita of 30,000 or less, and most of the remaining 40% belongs to countries between 30,000 and 65,000.
The comparison table shows that Z has the highest median (17,600), followed by X (13,400) and Y (11,900). X gives every country the same weight, so its median reflects the typical country. The lower median of Y suggests that the most populous countries tend to have a GDP per capita below that of the typical country, while the higher median of Z suggests that countries with large land areas tend to be richer than the typical country.
The quartiles tell a similar story. The 25th percentile is lowest for X (4,900), then Y (6,600), then Z (11,000), while the 75th percentile is lowest for Y (17,600), then X (32,100), then Z (41,900). The interquartile range of Y (11,000) is therefore much narrower than that of X (27,200) or Z (30,900). This is consistent with population weighting concentrating the distribution around the GDP per capita of a few very populous countries, whereas X and Z spread across both low-income and high-income countries.
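A weighted quantile can be read off in a similar way: sort by GDP per capita, accumulate the normalized weights, and take the first value whose cumulative weight reaches the requested probability. The sketch below uses this simple convention on invented numbers. The function weighted_quantile is a hypothetical helper, and library routines may interpolate differently.

```r
# Hypothetical helper: first value whose cumulative weight is at least prob.
weighted_quantile <- function(values, weights, prob) {
  ord <- order(values)
  values <- values[ord]
  cum_w <- cumsum(weights[ord]) / sum(weights)
  values[which(cum_w >= prob)[1]]
}

# Toy example: three countries, the poorest holds most of the weight.
toy_values <- c(1000, 5000, 20000)
toy_weights <- c(70, 20, 10)

weighted_quantile(toy_values, toy_weights, 0.5)  # weighted median
weighted_quantile(toy_values, rep(1, 3), 0.5)    # equal weights, for comparison
```

With the weights above the weighted median falls on the poorest country, while equal weights give the middle country, which mirrors why the medians of X, Y and Z can differ.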
7
# Select the country names and the yearly scores from 2006 to 2022
avg_table <- select(list_by_country, Country, c(5:19))
# Calculate the average score across the 15 yearly columns for each country
avg_table$avg <- rowMeans(avg_table[, c(2:16)])
world_map <- joinCountryData2Map(avg_table, joinCode = "NAME", nameJoinColumn = "Country")
## 165 codes from your data successfully matched countries in the map
## 2 codes from your data failed to match with a country code in the map
## 78 codes from the map weren't represented in your data
# World map colored by the average democracy index
mapCountryData(world_map, nameColumnToPlot = "avg", mapTitle = "Average Democracy Index, 2006-2022",
catMethod = "fixedWidth", numCats = 10, missingCountryCol = "white", addLegend = TRUE, oceanCol = "lightblue")
# World map colored by the 2006 democracy index
mapCountryData(world_map, nameColumnToPlot = "X2006", mapTitle = "Democracy Index, 2006",
catMethod = "fixedWidth", numCats = 10, missingCountryCol = "white", addLegend = TRUE, oceanCol = "lightblue")
# World map colored by the 2022 democracy index
mapCountryData(world_map, nameColumnToPlot = "X2022", mapTitle = "Democracy Index, 2022",
catMethod = "fixedWidth", numCats = 10, missingCountryCol = "white", addLegend = TRUE, oceanCol = "lightblue")
In 2006 most of Africa had low democracy scores, with some exceptions, and Russia was orange, in the middle of the scale. America, Europe and Australia were very democratic.
In 2022 Africa had become more democratic (more orange countries), with some exceptions, while Russia had turned yellow, meaning less democratic than it was. America, Europe and Australia stayed almost the same.
We calculated the average democracy score between 2006 and 2022 for each country. We then created a color scale for the scores, from the lowest average (1.06), colored close to white, to the highest (9.83), colored red, and each country was colored according to its position on this scale. Reviewing the results, we can say that countries on the west side of the map are the most democratic. Countries at the south-east edge of the map and in Europe are very democratic as well. On the other hand, Africa, the Middle East and the countries in the east and south of the map have low democracy scores (with exceptions).
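The per-country average can also be computed by selecting the year columns by name rather than by position, which keeps working if columns are reordered or added. A sketch on a two-row toy table with values copied from the output above. Only three of the fifteen year columns are included, so these averages differ from the real ones.

```r
toy <- data.frame(
  Country = c("Afghanistan", "Albania"),
  X2022 = c(0.32, 6.41),
  X2021 = c(0.32, 6.11),
  X2006 = c(3.06, 5.91),
  min_index = c(0.32, 5.67)  # not a year column, must be excluded
)

# Year columns are named "X" followed by four digits.
year_cols <- grep("^X[0-9]{4}$", names(toy), value = TRUE)

# na.rm = TRUE averages over the available years if some are missing.
toy$avg <- rowMeans(toy[, year_cols], na.rm = TRUE)
toy[, c("Country", "avg")]
```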
8.a.
# Join the components table with the joined table into a new variable
joined_comp <- merge(components, joined_table, by = "Country")
# Rename the five component columns that we will need
colnames(joined_comp)[colnames(joined_comp) == "Elec.toral.pro.cessand.plura.lism"] <- "Electoral_processand_pluralism"
colnames(joined_comp)[colnames(joined_comp) == "Func.tioningof.govern.ment"] <- "Functioning_of_government"
colnames(joined_comp)[colnames(joined_comp) == "Poli.ticalpartici.pation"] <- "Political_participation"
colnames(joined_comp)[colnames(joined_comp) == "Poli.ticalcul.ture"] <- "Political_culture"
colnames(joined_comp)[colnames(joined_comp) == "Civilliber.ties"] <- "Civil_liberties"
# Show the first 5 rows
head(joined_comp,5)
## Country Rank
## 1 Afghanistan 167
## 2 Albania 64
## 3 Algeria 113
## 4 Angola 109
## 5 Argentina 50
## .mw.parser.output..tooltip.dotted.border.bottom.1px.dotted.cursor.help.Δ.Rank
## 1
## 2 4
## 3
## 4 13
## 5
## Regime.type.x Overall.score Δ.Score Electoral_processand_pluralism
## 1 Authoritarian 0.32 0.00
## 2 Flawed democracy 6.41 0.30 7.00
## 3 Authoritarian 3.66 0.11 3.08
## 4 Authoritarian 3.96 0.59 4.50
## 5 Flawed democracy 6.85 9.17
## Functioning_of_government Political_participation Political_culture
## 1 0.07 0.00 1.25
## 2 6.43 5.00 6.25
## 3 2.50 3.89 5.00
## 4 3.21 4.44 5.00
## 5 5.00 7.78 4.38
## Civil_liberties Region.x X2022.rank Regime.type.y
## 1 0.29 Asia and Australasia 167 Authoritarian
## 2 7.35 Central and Eastern Europe 64 Flawed democracy
## 3 3.82 Middle East and North Africa 113 Authoritarian
## 4 2.65 Sub-Saharan Africa 109 Authoritarian
## 5 7.94 Latin America and the Caribbean 50 Flawed democracy
## X2022 X2021 X2020 X2019 X2018 X2017 X2016 X2015 X2014 X2013 X2012 X2011 X2010
## 1 0.32 0.32 2.85 2.85 2.97 2.55 2.55 2.77 2.77 2.48 2.48 2.48 2.48
## 2 6.41 6.11 6.08 5.89 5.98 5.98 5.91 5.91 5.67 5.67 5.67 5.81 5.86
## 3 3.66 3.77 3.77 4.01 3.5 3.56 3.56 3.95 3.83 3.83 3.83 3.44 3.44
## 4 3.96 3.37 3.66 3.72 3.62 3.62 3.4 3.35 3.35 3.35 3.35 3.32 3.32
## 5 6.85 6.81 6.95 7.02 7.02 6.96 6.96 7.02 6.84 6.84 6.84 6.84 6.84
## X2008 X2006 min_index max_index UN.Region IMF.est IMF.year World.bank.est
## 1 3.02 3.06 0.32 3.06 Asia 2456 2020 1666
## 2 5.91 5.91 5.67 6.11 Europe 19029 2023 15709
## 3 3.32 3.17 3.17 4.01 Africa 13507 2023 12128
## 4 3.35 2.41 2.41 3.72 Africa 7222 2023 6491
## 5 6.63 6.63 6.63 7.02 Americas 27261 2023 23650
## World.bank.year CIA.est CIA.year Rank.x Population Population.1 Date
## 1 2021 1500 2021 46 32890171 0.409% 1 Jul 2020
## 2 2021 14500 2021 137 2793592 0.0348% 1 Jan 2022
## 3 2021 11000 2021 32 45400000 0.565% 1 Jan 2022
## 4 2021 5900 2021 44 33086278 0.412% 30 Jun 2022
## 5 2021 21500 2021 31 46044703 0.573% 18 May 2022
## Source..official.or.from.the.United.Nations. Notes.x Region.y Count.2.
## 1 Official estimate[48] <NA> <NA>
## 2 Official estimate[134] <NA> <NA>
## 3 Official estimate[35] Africa 94749
## 4 National annual projection[46] Africa 24966
## 5 2022 census preliminary result[34] <NA> <NA>
## Rate.100000 Male.....a. Female.....4. National.....b. Foreign.....5.
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 217 98.5 1.5 96.2 3.8
## 4 79 97.3 2.7 — —
## 5 <NA> <NA> <NA> <NA> <NA>
## Occupancy.....6. Remand.....7. Rank.y Totalin.km2..mi2. LandArea
## 1 <NA> <NA> 40 652867 (252073) 652867
## 2 <NA> <NA> 140 28748 (11100) 27398
## 3 89.3 12.0 10 2381741 (919595) 2381741
## 4 110.8 45.8 22 1246700 (481400) 1246700
## 5 <NA> <NA> 8 2780400 (1073500) 2736690
## Waterin.km2..mi2. X.water Notes.y
## 1 0 (0) 0 <NA>
## 2 1350 (520) 4.7
## 3 0 (0) 0 [Note 13]
## 4 0 (0) 0
## 5 43710 (16880) 1.6 [Note 11]
# Calculate the correlations between the five components
selected_columns <- select(joined_comp,Electoral_processand_pluralism, Functioning_of_government,
Political_participation, Political_culture, Civil_liberties)
selected_columns <- as.data.frame(lapply(selected_columns, as.numeric))
cor_matrix <- cor(selected_columns)
corrplot(cor_matrix, method = "color", type = "upper",
tl.col = "black", tl.srt = 45)
The first table shows the first five rows of the new merged data.
In the heatmap of correlations between the five democracy components, two pairs have the highest correlations (above 0.8): Electoral process and pluralism (EPP) with Civil liberties (CL), and Functioning of government (FG) with Civil liberties. The pairs (EPP, FG), (EPP, Political participation (PP)) and (PP, CL) are also highly correlated. The lowest correlations are for the pairs (EPP, Political culture) and (PP, Political culture).
Generally speaking, the darker the color, the stronger the correlation.
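Reading the strongest and weakest pairs off a heatmap by eye is error-prone, so the pairs can also be listed numerically. A minimal sketch on a toy data frame, keeping only the upper triangle of the correlation matrix so that each pair appears once. The columns x1, x2, x3 are placeholders, not the democracy components.

```r
toy <- data.frame(
  x1 = 1:6,
  x2 = c(2, 4, 6, 8, 10, 12),   # perfectly correlated with x1
  x3 = c(1, -1, 1, -1, 1, -1)   # weakly correlated with x1 and x2
)

cm <- cor(toy)

# Indices of the upper triangle: each unordered pair once, no diagonal.
idx <- which(upper.tri(cm), arr.ind = TRUE)

pair_table <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  correlation = cm[idx]
)

# Strongest correlations first.
pair_table <- pair_table[order(pair_table$correlation, decreasing = TRUE), ]
pair_table
```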
8.b.
joined_comp <- joined_comp[complete.cases(joined_comp$CIA.est), ]
joined_comp$Electoral_processand_pluralism <- as.numeric(joined_comp$Electoral_processand_pluralism)
joined_comp$Functioning_of_government <- as.numeric(joined_comp$Functioning_of_government)
joined_comp$Political_participation <- as.numeric(joined_comp$Political_participation)
joined_comp$Political_culture <- as.numeric(joined_comp$Political_culture)
joined_comp$Civil_liberties <- as.numeric(joined_comp$Civil_liberties)
model <- lm(joined_comp$CIA.est ~ Electoral_processand_pluralism + Functioning_of_government +
Political_participation + Political_culture + Civil_liberties, data = joined_comp)
# Show the summary of the regression analysis
summary(model)
##
## Call:
## lm(formula = joined_comp$CIA.est ~ Electoral_processand_pluralism +
## Functioning_of_government + Political_participation + Political_culture +
## Civil_liberties, data = joined_comp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33113 -9146 -2288 7451 67080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15562.8 4635.4 -3.357 0.000985 ***
## Electoral_processand_pluralism -2969.7 893.8 -3.322 0.001108 **
## Functioning_of_government 4857.7 1040.2 4.670 6.37e-06 ***
## Political_participation 624.1 1094.7 0.570 0.569421
## Political_culture 2668.6 929.5 2.871 0.004647 **
## Civil_liberties 2367.4 1327.2 1.784 0.076384 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16470 on 159 degrees of freedom
## Multiple R-squared: 0.459, Adjusted R-squared: 0.442
## F-statistic: 26.98 on 5 and 159 DF, p-value: < 2.2e-16
# Extract the p-values for each coefficient
p_values <- summary(model)$coefficients[, 4]
# Identify the significant coefficients at alpha = 0.01
significant_coefficients <- names(p_values[p_values < 0.01])
# Print the significant coefficients
cat("Significant coefficients at alpha = 0.01:\n")
## Significant coefficients at alpha = 0.01:
cat(significant_coefficients, sep = ", ")
## (Intercept), Electoral_processand_pluralism, Functioning_of_government, Political_culture
# Calculate the residuals
residuals <- residuals(model)
# Combine the residuals with the country names
residuals_df <- data.frame(Country = joined_comp$Country, Residuals = residuals)
# Sort the dataframe by the absolute value of residuals in descending order
residuals_df <- residuals_df[order(abs(residuals_df$Residuals), decreasing = TRUE), ]
# Display the 5 countries with the largest absolute residuals
cat("Countries with the highest residuals:\n")
## Countries with the highest residuals:
head(residuals_df, 5)
## Country Residuals
## 91 Luxembourg 67079.56
## 126 Qatar 65918.29
## 135 Singapore 59621.97
## 73 Ireland 54402.18
## 158 United Arab Emirates 42450.02
# Display the 5 countries with the smallest absolute residuals
cat("Countries with the lowest residuals:\n")
## Countries with the lowest residuals:
tail(residuals_df, 5)
## Country Residuals
## 113 North Korea 740.521309
## 79 Jordan -642.005483
## 165 Yemen 208.853652
## 147 Taiwan 102.620253
## 28 Chad 6.765322
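The five countries with the largest absolute residuals are Luxembourg, Qatar, Singapore, Ireland and the United Arab Emirates. All five residuals are positive, meaning that their actual GDP per capita is far higher than the model predicts from the democracy components.
The five countries with the smallest absolute residuals are North Korea, Jordan, Yemen, Taiwan and Chad, so the model predicts their GDP per capita almost exactly.
Other factors that could explain why a country's GDP per capita is higher or lower than the model predicts include: economic policies and governance, natural resources and their management, political stability, education and human capital, infrastructure development, trade and international relations, technological advancements, income inequality and distribution, access to healthcare and social services, and environmental factors.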
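Since the ranking above sorts by absolute residual, the head of the table holds the countries the model fits worst and the tail holds those it fits best, regardless of sign. A sketch on toy data that keeps the sign visible alongside the ranking. The labels A to F and the values are invented for illustration.

```r
toy <- data.frame(
  label = c("A", "B", "C", "D", "E", "F"),
  x = 1:6,
  y = c(1, 2, 3, 4, 5, 20)
)

fit <- lm(y ~ x, data = toy)
r <- unname(residuals(fit))

res <- data.frame(
  label = toy$label,
  residual = r,
  # Positive residual: the observed value is above the fitted value.
  direction = ifelse(r > 0, "under-predicted", "over-predicted")
)

# Largest absolute residual first.
res <- res[order(abs(res$residual), decreasing = TRUE), ]
res
```

Keeping the direction column makes it clear whether a poorly fitted country is richer or poorer than the model expects.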